a is the intercept, which is the value of Y when X = 0.
b is the slope, which is the amount Y changes when X increases by 1.
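As a quick illustration, here's a minimal Python sketch (the values a = 4 and b = 2, and the X values, are made up for illustration, not taken from any real data set) showing how the intercept and slope generate predicted Y values:

```python
# Straight-line model: predicted Y = a + b*X
# The parameter values below are made up for illustration only.
a = 4.0   # intercept: the predicted Y when X = 0
b = 2.0   # slope: how much predicted Y changes when X increases by 1

for x in [0, 1, 2, 3]:
    print(f"X = {x}: predicted Y = {a + b * x}")
# Prints 4.0, 6.0, 8.0, 10.0 -- the line starts at the intercept and rises by the slope
```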
In straight-line regression, our goal is to develop the best-fitting line for our data. Using least-squares
as a guide, the best-fitting line through a set of data is the one that minimizes the sum of the squares
(SSQ) of the residuals. Residuals are the vertical distances of each point from the fitted line, as shown
in Figure 16-2.
FIGURE 16-2: On average, a good-fitting line has smaller residuals than a bad-fitting line.
For curves, finding the best fit is a very complicated mathematical problem. What's nice
about straight-line regression is that it's simple enough that you can calculate the least-squares
parameters from explicit formulas. If you're interested (or if your professor insists that you're
interested), we present a general outline of how those formulas are derived.
Think of a set of data containing $X_i$ and $Y_i$ values, in which i is an index that identifies each observation in the set, as described in Chapter 2. From those data, SSQ can be calculated like this:

$$SSQ = \sum_i \bigl(Y_i - (a + bX_i)\bigr)^2$$
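To make that formula concrete, here's a minimal Python sketch (the five data points and the candidate values of a and b are made up for illustration, not taken from the book's examples) that computes the residuals and their SSQ for one candidate line:

```python
# SSQ = sum of squared residuals for a candidate line Y = a + b*X
# Data and candidate parameters are made-up illustration values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

a, b = 0.5, 1.8   # one candidate intercept and slope (not necessarily the best)

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]   # vertical distances from the line
ssq = sum(r ** 2 for r in residuals)

print(f"SSQ for a={a}, b={b}: {ssq:.3f}")
# Least-squares regression looks for the a and b that make this SSQ as small as possible.
```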
If you’re good at first-semester calculus, you can find the values of a and b that minimize SSQ by
setting the partial derivatives of SSQ with respect to a and b equal to 0. If you stink at calculus, trust
that this leads to these two simultaneous equations:

$$aN + b\sum_i X_i = \sum_i Y_i$$

$$a\sum_i X_i + b\sum_i X_i^2 = \sum_i X_i Y_i$$

where N is the number of observed data points.
These equations can be solved for a and b:

$$b = \frac{N\sum_i X_i Y_i - \bigl(\sum_i X_i\bigr)\bigl(\sum_i Y_i\bigr)}{N\sum_i X_i^2 - \bigl(\sum_i X_i\bigr)^2} \qquad a = \frac{\sum_i Y_i - b\sum_i X_i}{N}$$
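Here's a short Python sketch of those explicit formulas (using the same made-up data as above); it computes the least-squares b and a directly from the sums, with no iterative searching:

```python
# Least-squares slope (b) and intercept (a) from the explicit formulas
# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)

sum_x  = sum(x)
sum_y  = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # slope
a = (sum_y - b * sum_x) / n                                    # intercept

print(f"slope b = {b:.3f}, intercept a = {a:.3f}")
```

If you have SciPy handy, scipy.stats.linregress(x, y) should return the same slope and intercept, which is a convenient check on the arithmetic.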